{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Translation Explanations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates model explanations for a text-to-text scenario, using pretrained transformer models for machine translation. We showcase explanations for two models: English to Spanish (https://huggingface.co/Helsinki-NLP/opus-mt-en-es) and English to French (https://huggingface.co/Helsinki-NLP/opus-mt-en-fr)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n",
"\n",
"import shap"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## English to Spanish model"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# load the model and tokenizer (.cuda() assumes a CUDA-capable GPU is available)\n",
"tokenizer = AutoTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-en-es\")\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(\"Helsinki-NLP/opus-mt-en-es\").cuda()\n",
"\n",
"# define the input sentences we want to translate\n",
"data = [\n",
" \"Transformers have rapidly become the model of choice for NLP problems, replacing older recurrent neural network models\"\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Explain the model's predictions"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": false
},
"outputs": [
],
"source": [
"# we build an explainer by passing the model we want to explain and\n",
"# the tokenizer we want to use to break up the input strings\n",
"explainer = shap.Explainer(model, tokenizer)\n",
"\n",
"# explainers are callable, just like models\n",
"shap_values = explainer(data, fixed_context=1)"
]
},
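{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough intuition for the kind of per-token attribution computed above, the sketch below calculates exact Shapley values for a toy scoring function over three input tokens. The `toy_score` function, its weights, and the token list are hypothetical stand-ins, not part of the translation model or the `shap` API. It only illustrates the underlying idea; the library does not enumerate every subset this way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import combinations\n",
"from math import factorial\n",
"\n",
"# hypothetical toy weights, made up for illustration only\n",
"WEIGHTS = {\"Transformers\": 2.0, \"replace\": 1.0, \"RNNs\": 0.5}\n",
"\n",
"\n",
"def toy_score(present):\n",
"    # score a set of present tokens, with a made-up interaction bonus\n",
"    score = sum(WEIGHTS[t] for t in present)\n",
"    if \"Transformers\" in present and \"RNNs\" in present:\n",
"        score += 1.0  # interaction term shared between the two tokens\n",
"    return score\n",
"\n",
"\n",
"def shapley_values(tokens, value_fn):\n",
"    # exact Shapley values: average each token's marginal contribution\n",
"    # over all subsets of the remaining tokens\n",
"    n = len(tokens)\n",
"    values = {}\n",
"    for token in tokens:\n",
"        others = [t for t in tokens if t != token]\n",
"        total = 0.0\n",
"        for size in range(n):\n",
"            weight = factorial(size) * factorial(n - size - 1) / factorial(n)\n",
"            for subset in combinations(others, size):\n",
"                with_token = value_fn(set(subset) | {token})\n",
"                without_token = value_fn(set(subset))\n",
"                total += weight * (with_token - without_token)\n",
"        values[token] = total\n",
"    return values\n",
"\n",
"\n",
"tokens = list(WEIGHTS)\n",
"attributions = shapley_values(tokens, toy_score)\n",
"\n",
"# the attributions sum to the score of the full input minus the empty input\n",
"print(attributions)"
]
},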
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize SHAP explanations"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
"